Method for Enriching Methylated CpG Sequences

ABSTRACT

Compositions and methods are provided for facilitating the enrichment of single-stranded DNA containing methylated CpG from a mixture containing methylated and unmethylated DNA. The compositions relate to methylation-binding protein domains that selectively bind to methylated single-stranded DNA. In embodiments of the invention, the methylated DNA is eluted in 0.4 M-0.6 M NaCl, while the unmethylated single-stranded DNA is eluted in less than 0.4 M salt. The ability to readily enrich for methylated DNA permits high-throughput sequencing of the methylated DNA and identification of abnormal methylation patterns associated with disease.

CROSS REFERENCE

This application is a continuation of U.S. Ser. No. 13/722,535 filed Dec. 20, 2012, which is a divisional of U.S. Ser. No. 12/608,489 filed Oct. 29, 2009, now U.S. Pat. No. 8,367,331, which claims priority from U.S. provisional application Ser. No. 61/111,499 filed Nov. 5, 2008, herein incorporated by reference.

BACKGROUND OF THE INVENTION

The task of epigenomic mapping is inherently more complex than genome sequencing because the epigenome is much more variable than the genome. While an individual has only one genome, the epigenome varies in time and space with age, tissue type, and exposure to environmental factors, and it shows aberrations in disease, especially in cancer. With methylated CpGs accounting for only ~2-6% of the genome (18), large-scale shotgun sequencing efforts will require some form of purification of short CpG-methylated sequences. Many current enrichment technologies lack the dynamic range necessary to capture small changes in CpG methylation that can have large repercussions for gene expression.

In the mammalian genome, 60-80% of the relatively infrequent CpG dinucleotides (on average 1 per 100 bp) are methylated at the carbon-5 position of cytosine (1). In contrast, dense clusters of unmethylated CpG sequences (~1 per 10 bp) are found at the transcription start sites of genes (2). In certain circumstances, these CpG islands are heavily methylated, with concomitant silencing of the promoter and of gene activity (3). These modifications are considered to be important for development (4), genomic imprinting (5), and X chromosome inactivation through gene silencing (6, 7). Aberrant DNA methylation of CpG islands has been frequently observed in cancer cells (8).

Many techniques exist for the enrichment of heavily methylated CpG islands from genomic DNA. One protocol relies on methylation-sensitive restriction endonucleases such as HpaII (CCGG) and HhaI (GCGC), followed by PCR identification, Southern blot analysis or microarray profiling (9). Another approach uses the ability of an immobilized methyl-CpG-binding domain (MBD) of the MeCP2 protein to bind selectively to methylated double-stranded DNA sequences. Restriction endonuclease-digested genomic DNA is loaded onto the affinity column, and fractions enriched in methylated CpG islands are eluted with a linear sodium chloride gradient. PCR, microarray, DNA sequencing and Southern hybridization techniques are used to detect specific sequences in these fractions (10). These techniques are limited by the specific recognition sequence of the restriction enzyme and therefore do not completely reflect all combinations of bases flanking the methylated CpG dinucleotide.

There are several additional methods for analysis of methylation patterns. In the bisulfite method, single-stranded DNA (ssDNA) is exposed to a deamination reagent (bisulfite) that converts unmethylated cytosines to uracils while methylated cytosines remain relatively intact (11). After cleanup, the resultant treated DNA of interest must be PCR amplified (converting the uracils to thymines) and analyzed by a myriad of techniques that can distinguish between methylated and unmethylated DNA. If the PCR products are cloned and sequenced, alignment analysis of the untreated and treated nucleotide sequences can reveal the in vivo methylation status of the amplified region. The PCR products can also be analyzed by combined bisulfite restriction analysis (COBRA assay) and methylation-specific PCR (MSP) (12, 13).
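The effect of the bisulfite chemistry on a sequence read can be illustrated with an idealized model. The Python sketch below is hypothetical and not part of the described method: it assumes complete conversion of every unmethylated cytosine and no conversion of 5-methylcytosine, and it reports the strand as it would read after PCR, when uracils appear as thymines. The function name `bisulfite_convert` and the 0-based `methylated` position set are illustrative choices.

```python
def bisulfite_convert(seq: str, methylated: set[int]) -> str:
    """Idealized post-PCR readout of one bisulfite-treated strand.

    Assumptions (hypothetical model): every unmethylated C is deaminated to U,
    which reads as T after PCR; every C whose 0-based index is in `methylated`
    is 5-methylcytosine and stays C. Other bases are unchanged.
    """
    out = []
    for i, base in enumerate(seq.upper()):
        if base == "C" and i not in methylated:
            out.append("T")  # unmethylated C -> U -> T after amplification
        else:
            out.append(base)  # 5mC and non-C bases are read unchanged
    return "".join(out)


# Usage: one methylated CpG (index 1), one unmethylated CpG (index 5),
# and one non-CpG cytosine (index 4).
print(bisulfite_convert("ACGTCCGA", {1}))    # ACGTTTGA
print(bisulfite_convert("ACGTCCGA", set()))  # ATGTTTGA
```

Comparing the two outputs shows why alignment against the untreated sequence reveals methylation: only the protected cytosine survives as C.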

Recently, direct shotgun ultra-high-throughput sequencing of bisulfite-converted DNA using the Illumina 1G Genome Analyzer and Solexa sequencing technology has yielded insights into the methylation state of the small (~120 Mbp) genome of the mustard plant Arabidopsis (14). This technology allowed the exact identification and quantification of 5-methylcytosines at single-nucleotide resolution in genes. Although highly specific and reasonably sensitive, it required at least 20-fold coverage to theoretically cover all potential methylated cytosines. Currently, no method exists to enrich bisulfite-converted CpG-methylated DNA, which is single-stranded by the nature of the deamination reaction, from total genomic DNA.

SUMMARY

Methods and compositions are described herein that include the embodiments listed below.

In one embodiment, an isolated first polypeptide is provided that includes an amino acid sequence having at least 90% homology or identity with SEQ ID NO:3 and is capable of binding single-stranded methylated polynucleotides. The first polypeptide may be fused to a second polypeptide and, if the second polypeptide is a substrate-binding domain such as maltose-binding protein (MBP), may be immobilized on a solid substrate by means of the second polypeptide. A property of the isolated first polypeptide may include an ability to bind a methylated CpG in a single-stranded polynucleotide.

Examples of the first polypeptide are the SRA domains of human UHRF1 and mouse NP95. Either of these polypeptides may be used in series or in parallel with a methyl-binding domain (MBD), which binds double-stranded methylated DNA, so that recovery of methylated DNA may be enhanced. For example, the sample may be applied to an MBD column, eluted, denatured and then applied to an SRA column. Alternatively, one aliquot of a sample may be applied to an MBD column and another aliquot applied to an SRA column.

The above-described polypeptides, either alone or as a fusion protein and either in solution or immobilized on a substrate, may be used for differentially binding a single-stranded methylated polynucleotide to a solid substrate, for example at a CpG site in a low-salt solution.

In an embodiment of the invention, a method is provided for enriching for CpG-methylated single-stranded polynucleotides from a mixture containing methylated and unmethylated polynucleotides. This method includes: binding the mixture to the first polypeptide described above; eluting the unmethylated polynucleotide from the isolated polypeptide in a solution containing a low concentration of a salt; and eluting the methylated polynucleotide from the isolated polypeptide in a solution containing a high concentration of a salt. The eluted methylated polynucleotide can then be sequenced and the methylation site analyzed.

In embodiments of the invention, a low concentration of the salt is less than 0.4 M salt and a high concentration of the salt is 0.4 M-0.6 M salt. The salt may be, for example, sodium chloride.
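The two salt ranges above amount to a simple decision rule for which fraction a given NaCl step is expected to release. The Python sketch below encodes that rule; the constant and function names are hypothetical, and assigning exactly 0.4 M to the high range follows from the text giving that range as 0.4 M-0.6 M.

```python
LOW_SALT_LIMIT_M = 0.4   # below this: unmethylated fraction
HIGH_SALT_LIMIT_M = 0.6  # 0.4 M up to and including 0.6 M: methylated fraction


def expected_fraction(nacl_molar: float) -> str:
    """Map an NaCl concentration (mol/L) to the fraction the text assigns it."""
    if nacl_molar < 0:
        raise ValueError("salt concentration cannot be negative")
    if nacl_molar < LOW_SALT_LIMIT_M:
        return "unmethylated"
    if nacl_molar <= HIGH_SALT_LIMIT_M:
        return "methylated"
    return "outside the described range"


# The 0.3 M and 0.5 M steps used in the examples fall on either side of 0.4 M.
for step in (0.3, 0.5):
    print(step, expected_fraction(step))
```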

In an embodiment of the invention, a method is provided that can be applied to determining the existence of pre-cancerous cells. The method includes: (a) comparing the methylation pattern for selected polynucleotide sequences in both pre-identified transformed eukaryotic cells and non-transformed eukaryotic cells by differential binding of methylated polynucleotides to the first polypeptide of claim 1; (b) determining the presence of abnormal methylation patterns associated with alteration of tumor suppressor function; and (c) utilizing the abnormal methylation patterns as a diagnostic tool for determining whether any eukaryotic cells in a sample are transformed. (In this context, “transformed” is intended to mean converted to a pre-cancerous state in which the cell is immortalized.)

BRIEF DESCRIPTION OF THE DRAWINGS

FIGS. 1A-1C show that methylated and unmethylated dsDNA bound to a GST-SRA-domain resin elute at low NaCl, while methylated ssDNA elutes at high NaCl.

FIG. 1A is a chromatogram profile at A280 of human chromatin DNA spiked with a small amount of FAM-labeled methylated (M) and unmethylated (U) CpG-containing oligonucleotides. Both the unmethylated and methylated oligos co-eluted with the bulk of the chromatin DNA between 0.2 M and 0.3 M NaCl.

FIG. 1B shows a gel containing individual column fractions in each lane. At higher NaCl, a faint band (*) corresponding to single-stranded methylated DNA was observed on the gel.

FIG. 1C shows a side-by-side comparison of the methylated and unmethylated oligos, confirming that the band (*) corresponded to methylated CpG-containing ssDNA.

FIGS. 2A-2B show that the method of DNA preparation significantly altered the elution characteristics of the GST-SRA-domain column.

FIG. 2A is a comparison of chromatogram profiles at A280 of 100 μg of MseI-digested HeLa DNA spiked with 3 μg of MseI-digested HeLa DNA that had been methylated by M.SssI with ³H-AdoMet. The DNA composition was heated to 98° C. for one minute and quickly chilled prior to loading onto the column. A large portion of the ³H-labeled DNA eluted off the column at 0.15 M NaCl; however, three distinct peaks that eluted at 0.3 M, 0.35 M and 0.4 M NaCl were observed, with a small peak of ³H-labeled DNA co-eluting with the 0.4 M NaCl peak. The gel shows the content of each fraction.

FIG. 2B shows the same DNA load preparation, which was sonicated for 1 minute, heated to 98° C. for 1 minute, chilled, and loaded onto the column. Three peaks were observed at 0.35 M, 0.4 M and 0.45 M NaCl, with the bulk of the ³H-labeled DNA co-eluting with the 0.4 M and 0.45 M peaks. The gel shows the content of each fraction.

FIG. 3 shows a flowchart of the procedures used to enrich single-stranded methylated CpG-containing DNA. Total genomic DNA was sonicated to 50-150 base fragments. The sample was either heated to 98° C., chilled and loaded onto the GST-SRA-domain column (or magnetic beads), or bisulfite-converted (which made the sample single-stranded and converted all non-methyl cytosines to uracils) prior to loading. The column/beads were washed with buffer containing 0.3 M NaCl, which eluted the active gene fraction. Methylated CpG-containing DNA remained on the column matrix and could be eluted with 0.5 M NaCl or, alternatively, equilibrated with low-NaCl buffer prior to the addition of the “fourN” cloning/sequencing primer (SEQ ID NO:1). The sample was heated to 98° C., chilled to 4° C., and then slowly raised to 37° C. Sequenase was introduced into the reaction and allowed to extend the ssDNA fragments; the reaction was then heated and chilled, and more Sequenase was added to label the other end of the DNA fragment. The defined-ends DNA was further amplified with a complementary PCR primer lacking the random nucleotides, purified, digested with BamHI, purified again and cloned into a sequencing vector.

FIGS. 4A-4D show that a simplified step salt gradient on the GST-SRA-domain column yielded reproducible elution profiles.

FIGS. 4A-4B show a comparison of two chromatogram profiles at A280 of 100 μg of sonicated, heated HeLa genomic DNA (FIG. 4A) or 200 μg (initial amount) of sonicated, bisulfite-converted genomic DNA (FIG. 4B). The 0.3 M and 0.5 M fractions were characterized by qRT-PCR or cloned and sequenced.

FIG. 4C shows the bisulfite-converted fractions, which were labeled and extended with a random “fourN” oligonucleotide and PCR amplified. Analysis of the PCR products before (−) and after (+) BamHI treatment on an ethidium-stained 20% TBE polyacrylamide gel showed the size distribution of fragments from the two peaks.

FIG. 4D shows that GST-SRA-domain-coupled magnetic beads retained only methylated (M) single-stranded lambda DNA after extensive washing with 0.3 M NaCl, as assayed on an ethidium-stained 20% TBE polyacrylamide gel.

FIG. 5 shows active and inactive gene enrichment from the GST-SRA-domain column. Active genes showed at least a 2-fold enrichment over input DNA in the 0.3 M peak. Single-copy inactive genes showed a direct correlation between fold enrichment and CpG occupancy in the 0.5 M peak. As copy number increased, satellite and LINE elements showed an inverse correlation between CpG occupancy and enrichment.

FIG. 6 shows a cartoon of the UHRF1 gene illustrating the location of the different domains in the protein. The inset shows an amino acid alignment of the SRA domains from mouse and human (SEQ ID NOS:2 and 3, respectively), revealing that the sequences are 90% identical.

FIG. 7 shows the DNA sequences of mouse and human (SEQ ID NOS:4 and 5, respectively).

FIG. 8 shows how the SRA domain can be used in sequencing platforms (e.g., the Helicos sequencing platform) to detect methylated CpG DNA: (1) methylated ssDNA (SEQ ID NO:6) is annealed to polyT on a slide; (2) methylated cytosine is detected by a fluorescently labeled NP95 SRA domain; and (3) the SRA is washed off and the DNA is sequenced.

Within the flow cells, billions of single molecules of ssDNA are captured on a solid surface. These captured strands serve as templates for the sequencing-by-synthesis process. Prior to the addition of polymerase and one fluorescently labeled nucleotide (C, G, A or T), the cell is flooded with MBP-SRA domain protein, which binds specifically to methylated CpG sequences. The cell is washed with a 100 mM NaCl wash buffer, and a fluorescently labeled anti-MBP antibody couples to the MBP-NP95 SRA domain/methylated CpG DNA complexes. After a wash step that removes free anti-MBP antibody, the cell is imaged and the positions of the methylated CpG-containing DNA strands are recorded. A high-salt wash step (500 mM NaCl) removes the antibody-MBP-NP95 SRA domain complex, and the sequencing process continues with a polymerase catalyzing the sequence-specific incorporation of fluorescent nucleotides into nascent complementary strands on all the templates. Multiple cycles result in complementary strands greater than 25 bases in length synthesized on billions of templates, providing a sequence read on the methylated CpG templates.

FIG. 9 shows a flowchart of the procedure used to compare a commercially available methylated CpG DNA enrichment system (e.g., Invitrogen) with MBP-NP95 SRA domain. Total HeLa genomic DNA was sonicated to 50-150 base fragments. Half of the sample was heated to 95° C. for 5 minutes and chilled on ice; the other half was not heated. To 1 μg of unheated sample, 1 μg of biotinylated (bt) MBD and buffer were added. Similarly, to 1 μg of heated DNA, 1 μg of MBP-NP95 SRA domain and buffer were added. Both samples were incubated at room temperature for 20 minutes. To the bt-MBD sample, 100 μl (1 mg) of Streptavidin Magnetic Beads was added; to the MBP-NP95 SRA domain sample, 100 μl (1 mg) of Anti-MBP Magnetic Beads was added. The samples were then incubated overnight at 4° C. with rotation. The bound complexes were washed 3× with 100 mM NaCl, 1% Triton, 0.1% Tween buffer, with magnetic separation and aspiration of buffer, and 1× with TE buffer containing 0.1% Tween. Finally, a small quantity of water was added to the aspirated samples, and the enriched methylated DNA was eluted from the magnetic beads by heat. The eluted DNA was then assayed by qPCR using primer sets to known active and inactive genes in HeLa DNA.

FIG. 10 shows fold-enrichment values for known methylated (inactive) and unmethylated (active) genes, comparing a commercially available methyl-CpG enrichment system (e.g., Invitrogen) with MBP-NP95 SRA domain protein. Both techniques resulted in similar enrichment of the inactive genes rDNA and MYOD, with no enrichment of the active gene RPL30.

DETAILED DESCRIPTION OF EMBODIMENTS

UHRF1 is a ubiquitin-like protein that improves the fidelity of maintenance methylation and has a histone methyltransferase function. It contains multiple domains (see FIG. 6), one of which is the SRA (SET and RING finger-associated) domain. The SRA-domain sequences are shown in FIG. 7. The SRA domain is capable of binding methylated CpG in a salt-dependent manner. In an embodiment of the invention, the SRA is immobilized on a matrix and can be used to bind methylated and unmethylated ssDNA or bisulfite-converted genomic DNA under low-salt conditions (for example, 0.15 M NaCl). The unmethylated DNA can be eluted from the SRA protein at an increased salt concentration such as 0.3 M NaCl, while methylated DNA can be eluted at 0.5 M NaCl.

Human UHRF1 is an example of a family of DNA-binding proteins associated with regulating gene expression via methylation. Other examples include DNMT1 and mouse NP95. This family of related proteins is shown here to be effective in differentiating methylated from unmethylated DNA.

These proteins can be produced in high yield and are relatively stable, which makes them suitable for attaching to solid substrates such as agarose resin, carbohydrate-coated beads or magnetic beads (NEB) without loss of binding activity. The immobilized protein can easily be integrated into a high-throughput bisulfite sequencing setup. With just one wash step and mild elution conditions, sensitivity and accuracy are enhanced. Thus, the reusable matrix can provide valuable information on the methylome and insights into aging and disease.

There are a variety of approaches by which the SRA-like proteins can be immobilized on a matrix. The matrix may include beads, 96-well plastic dishes, columns or any other support material. Where beads are selected, these can be magnetic, colored and/or coated with a carbohydrate or other ligand suitable for binding the SRA. To facilitate binding of the SRA-like proteins to a matrix, the SRA-like protein can be synthesized as a fusion protein by standard molecular biology techniques in prokaryotic or eukaryotic host cells. For example, the SRA-like proteins may be synthesized as an SRA-chitin-binding domain fusion for binding chitin or as SRA-MBP for binding amylose. Examples of suitable fusion proteins are provided, for example, in U.S. Pat. No. 5,643,758.

Other examples of fusion proteins include SRA-AGT or SRA-ACT proteins (using the SNAP-Tag® or CLIP-Tag™ technology provided commercially by New England Biolabs). These fusion proteins can be labeled as required for detection or purification of polynucleotides, for example by using fluorescent labels after covalent binding of the ACT/AGT in the fusion protein to labeled substrates such as benzylguanine or benzylcytosine, leaving the SRA available to bind methylated DNA in vitro or in vivo.

The SRA may also be bound to a matrix or solid substrate such as beads, columns, glass, plastic or polymer surfaces, etc. Binding can be achieved by any ligand/ligand-binding molecule system, including antibody/antigen, biotin/streptavidin, chitin-binding domain, maltose-binding domain, etc. SRA-like proteins may be synthesized as intein fusions to facilitate certain separation methods (U.S. Pat. Nos. 5,496,714 and 5,834,247).

In an embodiment of the invention, a binding preference of SRA-like proteins for methylated single-stranded polynucleotides was demonstrated. This property can be exploited for detection, purification and analysis of the polynucleotides using immobilized SRA bound to the matrix. The methylated polynucleotides can then be sequenced to identify the location of the methylated CpG. In another embodiment, a double-stranded polynucleotide can be bound to SRA, where methylation, if present, can be detected on either strand.

Mammalian UHRF1 SRA domains (such as human UHRF1 or murine NP95) can be used to augment high-throughput sequencing methodologies, for example True Single Molecule Sequencing (tSMS)™ technology (Helicos Biosciences), by binding and identifying single-stranded methylated CpG-containing DNA prior to a series of nucleotide additions and detection cycles that then determine the sequence of each fragment (FIG. 8). By integrating the UHRF1-SRA domain into this instrumentation setup, additional epigenetic information can be layered on top of rapid and inexpensive resequencing of genomes to facilitate the understanding of methylation states in complex organisms.

The mammalian UHRF1 SRA domains can be displaced from the polynucleotide by adding cations that neutralize the charge on the DNA and thereby release the electrostatically bound protein. In embodiments of the invention, the protein binding to the polynucleotide is disrupted using NaCl. However, the use of this salt is not intended to be limiting. Moreover, it was found that the protein binds more tightly to polynucleotides at methylated CpGs, so that a high salt concentration was required to release CpG-methylated polynucleotides, whereas a low salt concentration sufficed to release CpG-unmethylated polynucleotides. In an embodiment of the invention, the low salt concentration was 0.3 M NaCl and the high salt concentration was 0.5 M NaCl. Table 1 provides the results of a two-step salt gradient.

Table 1 shows a sequence analysis of the two NaCl peaks from the GST-SRA-domain column. Greater than 10-fold enrichment of methylated CpG-containing DNA was observed. In the high (0.5 M) NaCl fraction, 19 of 30 reads (average size 63 bases) contained at least one methylated CpG, and 44 of 1900 bases (2.32% of the total) were methylated CpG. In the low (0.3 M) NaCl fraction, 3 of 22 reads (average size 105 bases) contained methylated CpG, and 5 of 2327 bisulfite-converted bases (0.215% of the total) were identified as methylated CpG.
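The percentages and the greater-than-10-fold figure follow from simple ratios of the counts given above. The short Python check below reproduces that arithmetic; the counts are taken from the text, and the helper name `percent` is illustrative.

```python
def percent(part: int, whole: int) -> float:
    """Return part/whole as a percentage."""
    return 100.0 * part / whole


# Counts from Table 1 (methylated CpG bases / total bases sequenced).
high_salt = percent(44, 1900)  # 0.5 M NaCl fraction
low_salt = percent(5, 2327)    # 0.3 M NaCl fraction
enrichment = high_salt / low_salt

print(f"0.5 M fraction: {high_salt:.2f}% methylated CpG")  # 2.32%
print(f"0.3 M fraction: {low_salt:.3f}% methylated CpG")   # 0.215%
print(f"enrichment: {enrichment:.1f}-fold")                # 10.8-fold
```

The ratio of roughly 10.8 is what supports the statement that enrichment exceeded 10-fold.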

All references cited herein, as well as U.S. provisional application Ser. No. 61/111,499 filed Nov. 5, 2008 and U.S. Ser. No. 12/608,489 filed Oct. 29, 2009, are incorporated by reference.

EXAMPLES

Example 1: SRA-Domain Protein Purification and the Covalent Coupling of the Protein to Solid-State Matrixes

The SRA domain (residues 386-618) was amplified from full-length human UHRF1 cDNA synthesized using total RNA from HeLa cells. The product was cloned into pENTR-TEV (GST Tag, Invitrogen) and recombined into pDEST15 (Invitrogen, Carlsbad, Calif.) to create the GST fusion. The construct was propagated in T7 Express E. coli (NEB) at 37° C. to an OD590 of 0.5 and induced with 0.1 mM IPTG overnight at 16° C. Cells were pelleted, lysed in a French press and centrifuged again, and the supernatant was layered over a 10 ml Glutathione Sepharose High Performance column (GE Healthcare). After a 10-column-volume wash, the protein was eluted with 10 mM L-glutathione (Sigma). The yield was 12 mg of purified SRA domain in total from 8 liters of shake-flask culture.

GST-SRA Column

9 ml of 1.2 mg/ml (10.8 mg total) of previously purified and dialyzed GST-SRA-domain protein in 10 mM Tris pH 7.5, 1 mM EDTA and 0.2 M NaCl was layered onto a 4.5 ml Glutathione Sepharose matrix equilibrated with the same buffer. Of the 10.8 mg load, 7.83 mg remained bound to the column. The resin was washed with 10 column volumes of the same buffer, then cycled twice with this buffer supplemented with 1 M NaCl before final equilibration at 0.05 M NaCl. Sequences of the methylated oligonucleotides were FAM-GTAGG5GGTGCTACA5GGTTCCTGAAGTG (top strand, SEQ ID NO:7) and FAM-CACTTCAGGAAC5GTGTAGCAC5GCCTAC (bottom strand), where 5 = 5-methylcytosine. Sequences of the unmethylated oligonucleotides were GTCACTGAAGCGGGAAGGGACTGGCTGCTCCCGGGCGAAGTGCCGGGGCAGGATCT-FAM (top strand, SEQ ID NO:8) and AGATCCTGCCCCGGCACTTCGCCCGGGAGCAGCCAGTCCCTTCCCGCTTCAGTGAC-FAM (bottom strand).
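The oligonucleotide listings above use “5” for 5-methylcytosine. A small Python sketch can confirm that each top/bottom pair forms a fully complementary duplex and can locate the methylated CpG sites. The helper names are hypothetical; the sketch treats 5-methylcytosine as pairing like cytosine and omits the FAM labels.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def as_dna(seq: str) -> str:
    """Replace the '5' (5-methylcytosine) symbol with C for base pairing."""
    return seq.replace("5", "C")


def revcomp(seq: str) -> str:
    """Reverse complement of an A/C/G/T sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def is_duplex(top: str, bottom: str) -> bool:
    """True if bottom is the exact reverse complement of top."""
    return revcomp(as_dna(top)) == as_dna(bottom)


def methyl_cpg_positions(seq: str) -> list[int]:
    """0-based indices of 5-methylcytosines that are followed by G."""
    return [i for i in range(len(seq) - 1) if seq[i] == "5" and seq[i + 1] == "G"]


M_TOP = "GTAGG5GGTGCTACA5GGTTCCTGAAGTG"
M_BOTTOM = "CACTTCAGGAAC5GTGTAGCAC5GCCTAC"
U_TOP = "GTCACTGAAGCGGGAAGGGACTGGCTGCTCCCGGGCGAAGTGCCGGGGCAGGATCT"
U_BOTTOM = "AGATCCTGCCCCGGCACTTCGCCCGGGAGCAGCCAGTCCCTTCCCGCTTCAGTGAC"

print(is_duplex(M_TOP, M_BOTTOM), is_duplex(U_TOP, U_BOTTOM))  # True True
print(methyl_cpg_positions(M_TOP), methyl_cpg_positions(M_BOTTOM))  # [5, 15] [12, 22]
```

Both strands of the methylated 29-mer carry 5-methylcytosine at the same two CpG sites, so the methylated duplex is fully (symmetrically) methylated, while the unmethylated spike is a 56-mer, consistent with the oligos being of different sizes.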

qPCR Analysis of NaCl Fractions from GST-SRA-Column

DNA from the high- and low-salt fractions was characterized by real-time PCR on a Bio-Rad MyiQ iCycler using Bio-Rad iQ SYBR Green Supermix and the following primer sets: hsALDOA, TCCTGGCAAGATAAGGAGTTGAC forward (SEQ ID NO:9) and ACACACGATAGCCCTAGCAGTTC reverse (SEQ ID NO:10); hsSERPINA, GGCTCAAGCTGGCATTCCT forward (SEQ ID NO:11) and GGCTTAATCACGCACTGAGCTTA reverse (SEQ ID NO:12); hsRPL30, CAAGGCAAAGCGAAATTGGT forward (SEQ ID NO:13) and GCCCGTTCAGTCTCTTCGATT reverse (SEQ ID NO:14); hsRASSF1, TCATCTGGGGCGTCGTG forward (SEQ ID NO:15) and CGTTCGTGTCCCGCTCC reverse (SEQ ID NO:16); hsMYO-D, CCGCCTGAGCAAAGTAAATGA forward (SEQ ID NO:17) and GGCAACCGCTGGTTTGG reverse (SEQ ID NO:18); hsMYT1, TGAAACCTTGGGTGTCGTTGGGAA forward (SEQ ID NO:19) and TTGCGGGCCATTGTTCCATGATGA reverse (SEQ ID NO:20); rDNA, CGTACTTTATCGGGGAAATAGGAGAAGTACG forward (SEQ ID NO:21) and GTGCTTAGAGAGGCCGAGAGGA reverse (SEQ ID NO:22); hsSAT, ATCGAATGGAAATGAAAGGAGTCA forward (SEQ ID NO:23) and GACCATTGGATGATTGCAGTCA reverse (SEQ ID NO:24); LINE, CGGAGGCCGAATAGGAACAGCTCCG forward (SEQ ID NO:25) and GAAATGCAGAAATCACCCGTCTT reverse (SEQ ID NO:26). The cycle program was as follows: cycle 1 (1×): 95° C., 5 minutes; cycle 2 (40×): step 1, 95° C., 10 seconds; step 2, 61° C., 30 seconds; step 3, 72° C., 30 seconds.
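The cycling program above can be written as data, which makes the total programmed hold time easy to check. The Python sketch below is illustrative only (the names are hypothetical); it sums the stated hold times and does not account for instrument ramp times, plate reads or any melt-curve step, none of which are specified in the text.

```python
# Cycling program from the text as (temperature in deg C, hold in seconds).
INITIAL_STEPS = [(95, 5 * 60)]
CYCLE_STEPS = [(95, 10), (61, 30), (72, 30)]
N_CYCLES = 40


def total_hold_seconds(initial, cycle, n_cycles):
    """Sum programmed hold times; ramp times and plate reads are not counted."""
    return sum(hold for _, hold in initial) + n_cycles * sum(hold for _, hold in cycle)


seconds = total_hold_seconds(INITIAL_STEPS, CYCLE_STEPS, N_CYCLES)
print(seconds, "s =", round(seconds / 60, 1), "min")  # 3100 s = 51.7 min
```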

Cloning and Sequencing of NaCl DNA Fragments from GST-SRA-Column

Eluted and desalted DNA fragments were cloned into BamHI-cut, alkaline phosphatase (CIP)-treated LITMUS 28i cloning vector using the “fourN” procedure (17), except that the oligonucleotide sequence was GTTTCCCAGTCAGGATCCNNNN (SEQ ID NO:1) and the PCR primer was GTTTCCCAGTCAGGATCC (SEQ ID NO:27). PCR products were purified using Qiagen columns, cut with BamHI, purified again, ligated to the vector and cloned as stated.
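The “fourN” oligonucleotide is degenerate at its last four positions, and the PCR primer is its fixed 5′ portion, which ends in a BamHI site. The Python sketch below, with hypothetical helper names, expands the degenerate positions and checks those relationships; it assumes N stands for any of A, C, G or T (standard IUPAC usage).

```python
from itertools import product

FOURN_OLIGO = "GTTTCCCAGTCAGGATCCNNNN"  # SEQ ID NO:1 (N = A, C, G or T)
PCR_PRIMER = "GTTTCCCAGTCAGGATCC"       # SEQ ID NO:27
BAMHI_SITE = "GGATCC"                   # BamHI recognition sequence


def expand_degenerate(seq: str):
    """Yield every concrete sequence represented by an N-containing oligo."""
    choices = [("A", "C", "G", "T") if base == "N" else (base,) for base in seq]
    for combo in product(*choices):
        yield "".join(combo)


variants = list(expand_degenerate(FOURN_OLIGO))
print(len(variants))                       # 256 = 4**4 random 3' ends
print(FOURN_OLIGO.startswith(PCR_PRIMER))  # True: PCR primer = fixed 5' part
print(PCR_PRIMER.find(BAMHI_SITE))         # 12: BamHI site at the primer's 3' end
```

Because the BamHI site sits immediately 5′ of the random tetramer, BamHI digestion of the PCR products trims the shared primer sequence and leaves ends compatible with the BamHI-cut vector.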

Results: GST-SRA-Domain of Human UHRF1 Coupled to a Solid Matrix Enriched Single-Stranded Methylated CpG-Containing DNA

To determine the preference of the SRA domain for unmethylated, fully methylated or hemimethylated double-stranded DNA or ssDNA in a solid-state matrix, the following experiment was performed. 7.83 milligrams of purified GST-SRA domain was bound to a 4.5 ml GST column. 1.68 milligrams of MNase-digested chromatin (~150-1000 bp) from human Jurkat cells, spiked with 1 μg each of fluorescein (FAM)-labeled double-stranded methylated CpG oligonucleotide and unmethylated CpG oligonucleotide of different sizes, was layered onto the column in buffer A (10 mM Tris pH 7.5, 1 mM EDTA, 0.05 M NaCl). After a 10-column-volume wash with buffer A, the column was developed with a 100 ml NaCl gradient to 1 M, and the fractions were assayed by gel electrophoresis (FIGS. 1A-1C). Both the methylated and unmethylated DNA oligos co-eluted with the bulk of the chromatin DNA between 0.2 M and 0.3 M NaCl. Interestingly, a faint fluorescent band that was smaller than the two annealed oligos eluted off the column at ~0.4 M NaCl. It was speculated that this band might contain unannealed methylated ssDNA.

To further investigate the binding preferences of the SRA-domain resin for ssDNA, 100 μg of MseI-digested HeLa DNA spiked with 3 μg of MseI-digested HeLa DNA that had been methylated by M.SssI with ³H-AdoMet was applied to the above equilibrated GST-SRA domain column. After a column wash in buffer A, a 30 ml step gradient from 0.1 M to 0.6 M NaCl was initiated and fractions were collected. The double-stranded DNA and the ³H-labeled fully methylated double-stranded DNA eluted off the column in the first two fractions at 0.15 M NaCl. Next, another DNA preparation of the same composition was heated to 98° C. for 1 minute and quickly chilled on ice for 5 minutes prior to loading on the equilibrated column. The same step gradient was used to elute the DNA, and the fractions were analyzed as before. A large portion of the ³H-labeled DNA eluted off the column at 0.15 M NaCl; however, three distinct peaks that eluted at 0.3 M, 0.35 M and 0.4 M NaCl were observed, with a small peak of ³H-labeled DNA co-eluting with the 0.4 M NaCl peak. Finally, a third DNA load preparation was sonicated for 1 minute, heated to 98° C. for 1 minute, chilled, and loaded onto the column. Three peaks were observed at 0.35 M, 0.4 M and 0.45 M NaCl, with the bulk of the ³H-labeled DNA co-eluting with the 0.4 M and 0.45 M peaks (FIGS. 2A and 2B). It was concluded that sonication plus heating of the sample fully converted the genomic DNA into a fragmented single-stranded form that facilitated binding of the DNA to the resin and greatly improved the ability of the matrix to discriminate between unmethylated and fully methylated CpG DNA.

Simplified Elution Profile Enriched Active and Inactive Genes

A new DNA preparation containing 100 μg of sonicated, heated HeLa genomic DNA was layered onto the above equilibrated column in buffer A. To simplify the elution protocol, a 0.15 M wash step followed by 0.3 M and 0.5 M elution steps was employed. Fractions containing the 0.3 M and 0.5 M peaks were collected, desalted and concentrated using a Qiagen miniprep column (FIG. 3 flow chart and FIGS. 4A-4D). The products from the salt fractions were characterized by qPCR on a Bio-Rad iCycler using primers to known active and inactive genes in HeLa cells (FIG. 5). The actively transcribed genes aldolase A (ALDOA), serpin peptidase inhibitor (SERPINA) and 60S ribosomal protein L30 (RPL30) showed a consistent two-fold enrichment in the 0.3 M peak over input DNA. The high-salt peak, presumably containing the inactive gene fraction, revealed little or no enrichment of these genes.
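The text does not state how fold enrichment over input was calculated from the qPCR data. One common approach, assumed here purely for illustration, compares the threshold cycle (Ct) of a salt fraction with that of input DNA amplified at equal mass: fold enrichment ≈ (1 + E)^(Ct_input − Ct_fraction), where E is the amplification efficiency (1.0 for perfect doubling per cycle). The function name and the Ct values below are hypothetical.

```python
def fold_enrichment(ct_fraction: float, ct_input: float, efficiency: float = 1.0) -> float:
    """Fold enrichment of a target in a salt fraction relative to input DNA.

    Assumes equal DNA mass was amplified from the fraction and from input, and
    that the primer pair amplifies with efficiency `efficiency` (1.0 = doubling
    per cycle). A lower Ct in the fraction means more template.
    """
    if not 0.0 < efficiency <= 1.0:
        raise ValueError("efficiency must be in (0, 1]")
    return (1.0 + efficiency) ** (ct_input - ct_fraction)


# Hypothetical Ct values for one active-gene primer pair.
print(fold_enrichment(ct_fraction=24.0, ct_input=25.0))  # 2.0 (two-fold enriched)
print(fold_enrichment(ct_fraction=26.0, ct_input=25.0))  # 0.5 (depleted)
```

Under this model, the two-fold enrichment reported for the active genes corresponds to the 0.3 M fraction reaching threshold about one cycle earlier than input.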

Six known repressed regions of the HeLa genome were interrogated in a similar fashion. The single-copy genes RAS association domain family protein 1 (RASSF1), myogenic differentiation 1 (MYO-D) and myelin transcription factor 1 (MYT1), as well as tandemly repeated ribosomal DNA (rDNA), showed a direct correlation between fold enrichment and CpG occupancy in the 0.5 M peak. Highly repetitive satellite DNA (hsSAT) showed less enrichment in the high-salt peak. In spite of their high CpG content, long interspersed nuclear elements (LINEs), which are transcribed by RNA polymerase II into mRNA (16), showed little difference between the low- and high-salt fractions, suggesting that the SRA-domain column may accurately reflect the extent of methylation of these sequences in the genome.

Random Sequencing of Cloned Fragments Derived from NaCl Eluted Fractions

Sodium bisulfite conversion of genomic DNA, although highly degrading to the DNA, can yield very high-resolution information about the methylation state of a given segment of DNA. As the SRA-domain resin favored fragmented ssDNA, it was ideally suited to bind and resolve bisulfite-converted DNA. To explore the characteristics of the SRA-domain column when bisulfite-converted DNA is applied, 200 μg of HeLa genomic DNA converted with the EpiTect Bisulfite Kit (Qiagen) was applied to the equilibrated column, washed and eluted as before. As in previous runs, two peaks were observed at the 0.3 M and 0.5 M NaCl step elutions. Fractions were collected, concentrated and desalted on Qiagen columns. Cloning of the fragments was accomplished using a modification of the “fourN” procedure (17), in which a small oligonucleotide containing a BamHI restriction site followed at its 3′ end by four random bases was annealed to the fragments at both ends and extended with Sequenase. Primers complementary to the known sequences introduced during the random priming reaction were added, and a PCR reaction amplified the products. After cleavage with BamHI, the DNA was cloned into a BamHI-linearized Litmus 28i vector and plated on AMP/IPTG/XGAL plates (FIG. 3 flow chart).

The DNA from 100 white colonies of the 0.5 M peak and 50 colonies of the 0.3 M peak was submitted for sequencing. Of the 100 reads from the 0.5 M peak, 30 were deemed suitable for analysis by the following criteria: 1) they contained viable sequences that could be identified as human by NCBI BlastN; 2) they showed evidence of non-methyl cytosine conversion (C to T or G to A, depending on orientation); and 3) they contained an unconverted C followed by G, or an unconverted G followed by C, again depending on forward or reverse sequencing orientation. Of these 30 reads (Table 1), with an average size of 63 bases, 19 contained at least one methylated CpG. Of the 1900 bases sequenced, 44 were methylated CpG, or 2.32% of the total. Notably, of the 19 methylated CpG sequences, 10 mapped to known CpG methylation sites: nuclear receptor subfamily 4 (19), Fanconi anemia (20), von Willebrand factor (21), coagulation factor XIII and transglutaminase (22), chromodomain protein Y-like (23), spectrin repeat (24), HECTD1 (25), zinc finger and BTB domain containing 46 (26), and pumilio (27). Of 22 reads with an average size of 105 bases in the low-salt 0.3 M peak, 3 contained methylated CpG. Of these 2327 bisulfite-converted bases, 5 were identified as methylated CpG, or 0.215% of the total. Although limited in scope, these data showed a better than 10-fold enrichment of methylated CpG in the high-NaCl peak versus the low-NaCl peak. Additional sequencing will be required to fully determine the potential fold enrichment by the SRA-domain resin as compared with random sequencing of genomic DNA or with CpG-methylated DNA enriched by other means such as an MBD column.
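Criteria 2 and 3 above can be applied mechanically once a read is aligned to its genomic reference. The Python sketch below is a simplified, hypothetical illustration rather than the analysis actually used: it handles only forward-orientation reads, assumes a gapless alignment of equal length, and ignores non-CpG cytosines that remain unconverted. A reference C read as T counts as converted; a reference CpG cytosine read as C is called methylated.

```python
def call_cpg_methylation(reference: str, read: str) -> dict:
    """Classify reference CpG cytosines in a forward-orientation bisulfite read.

    Sketch assumptions: the read is already aligned to the genomic reference
    without gaps (equal length), and only the forward orientation is handled.
    """
    if len(reference) != len(read):
        raise ValueError("sketch expects a gapless, equal-length alignment")
    ref, seq = reference.upper(), read.upper()
    result = {"methylated_cpg": [], "unmethylated_cpg": [], "non_cpg_converted": 0}
    for i, (ref_base, read_base) in enumerate(zip(ref, seq)):
        if ref_base != "C":
            continue
        in_cpg = i + 1 < len(ref) and ref[i + 1] == "G"
        if read_base == "T":  # C read as T: the cytosine was converted
            if in_cpg:
                result["unmethylated_cpg"].append(i)
            else:
                result["non_cpg_converted"] += 1  # conversion evidence (criterion 2)
        elif read_base == "C" and in_cpg:  # unconverted C followed by G (criterion 3)
            result["methylated_cpg"].append(i)
    return result


print(call_cpg_methylation("ACGTCCGA", "ACGTTTGA"))
# {'methylated_cpg': [1], 'unmethylated_cpg': [5], 'non_cpg_converted': 1}
```

Summing `methylated_cpg` counts over all qualifying reads, and dividing by the total bases sequenced, gives per-fraction percentages of the kind reported above.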

GST-SRA-Domain Protein Covalently Coupled to Magnetic Beads Showed Similar Binding and Elution Characteristics

As an alternative to column chromatography, GST-SRA-domain protein covalently coupled to nonporous paramagnetic particles was tested for its suitability as a high-throughput purification matrix for methylated CpG sequences. To compare the binding characteristics of the GST-SRA-domain magnetic beads, 5 μg of sonicated unmethylated lambda DNA or 5 μg of sonicated, fully enzymatically methylated (M.SssI) lambda DNA was added to 50 μl of a 50% slurry of 10 mg/ml SRA-domain magnetic beads in 150 mM NaCl, 0.1% Tween 20, 10 mM Tris pH 7.5 and 1 mM EDTA, and mixed end over end for 30 minutes at room temperature. The tubes were placed on a magnetic separation rack and the supernatant was aspirated. The beads were washed and magnetically separated three times with the same buffer supplemented with an additional 150 mM NaCl (0.3 M NaCl final). The beads were then loaded directly on a 20% native TBE acrylamide gel for analysis. Similarly, sonicated methylated and unmethylated lambda DNA samples were heated to 98° C. and chilled before binding to the magnetic beads, followed by the washes described above. The ethidium-stained gel showed that only the heat-denatured methylated lambda DNA remained on the beads after the 0.3 M NaCl washes (FIGS. 4A-4D). Additional work is needed to characterize the DNA fragments that remain bound to the beads by direct linker addition and DNA sequencing.

Example 2 Common Properties Shared by SRA Domains from Different Sources

An MBP-NP95 SRA-domain fusion protein effectively enriched single-stranded methylated CpG DNA from a small amount of input DNA, as demonstrated below.

The SRA domain of mouse NP95, which is 90% identical to human UHRF1, bound and enriched fragmented methylated ssDNA from 1 μg of input DNA. The mouse NP95 SRA domain enriched methylated CpG-containing DNA 20- to 25-fold from 1 μg of fractionated ssDNA, and was comparable in yield and sensitivity to a methyl-binding domain (MBD).
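An identity figure such as the 90% quoted above depends on how the alignment is scored. The sketch below is minimal and makes two assumptions: the two protein sequences have already been aligned by a separate pairwise aligner, and identity is counted over ungapped columns only. The function name and the toy sequences are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity between two pre-aligned sequences of equal length.

    Convention assumed here: identical residues divided by the number of
    alignment columns in which neither sequence has a gap. Other conventions
    (e.g., dividing by the full alignment length or by the shorter sequence)
    give different values for the same alignment.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [
        (x, y)
        for x, y in zip(aligned_a.upper(), aligned_b.upper())
        if x != gap and y != gap
    ]
    if not columns:
        raise ValueError("alignment has no ungapped columns")
    identical = sum(1 for x, y in columns if x == y)
    return 100.0 * identical / len(columns)


# Toy alignment (hypothetical sequences, not NP95 or UHRF1):
print(percent_identity("MK-TAY", "MKQTAF"))  # prints 80.0
```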

As an alternative to column chromatography, an MBP-NP95 SRA-domain fusion protein, used together with an anti-MBP monoclonal antibody coupled to paramagnetic beads, was tested for its suitability as a high-throughput purification matrix for methylated CpG sequences. The binding and elution characteristics of the NP95 SRA domain were compared with those of a commercially available methylated CpG enrichment system that uses biotinylated MBD (MethylMiner™ Methylated DNA Enrichment Kit, Invitrogen). For this comparison, 1 μg of sonicated, heated HeLa DNA (for NP95 SRA) or 1 μg of sonicated HeLa DNA (for MBD) was added to 1 μg of MBP-NP95 SRA (15 μl) or 1 μg of biotinylated MBD (2 μl), respectively. Each reaction had a total volume of 200 μl and contained 20 μl 10× NEBuffer 4 (1×: 50 mM potassium acetate, 20 mM Tris-acetate, 10 mM magnesium acetate, 1 mM dithiothreitol, pH 7.9) and 2 μl of 100 μg/ml BSA; reactions were incubated for 30 minutes at room temperature. Then 100 μl (1 mg) of anti-MBP magnetic beads (NEB) was added to the MBP-NP95 SRA reactions, and 100 μl (~1 mg) of streptavidin magnetic beads (Invitrogen) was added to the MBD reactions. Both reactions were mixed end over end overnight at 4° C. The tubes were placed on a magnetic separation rack and the supernatant was aspirated. The beads were washed and magnetically separated three times with 15 ml of wash buffer (20 mM Tris-HCl pH 7.5, 100 mM NaCl, 1 mM EDTA, 1% Triton X-100, 0.1% Tween 20), followed by a final 15 ml wash in low-salt buffer (20 mM Tris-HCl, 1 mM EDTA, 0.1% Tween 20) (see FIG. 9). Then 140 μl of water was added to the bead complexes, and the samples were heated to 98° C. to release the enriched methylated DNA. The products of this heat step were characterized by qPCR on a Bio-Rad iCycler using primers to known active and inactive genes in HeLa cells. The actively transcribed gene encoding ribosomal protein L30 (RPL30) showed no enrichment in either the MBP-NP95 SRA samples or the bt-MBD samples. The methylated genes myogenic differentiation 1 (MYO-D) and tandemly repeated ribosomal DNA (rDNA) showed a 20- to 25-fold enrichment in the MBP-NP95 SRA samples, comparable to the enrichment in the bt-MBD samples (FIG. 8). Additional work is needed to characterize the DNA fragments that remain bound to the beads by direct linker addition and DNA sequencing.
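The text does not state how the qPCR enrichment values were calculated. One common approach is the comparative Ct (2^-ΔΔCt) method, shown here only as an assumed illustration. It normalizes a target locus to a non-enriched reference locus in both the enriched fraction and the input. RPL30 would be a natural reference here, since it showed no enrichment. The function name, the efficiency parameter and the Ct values in the usage example are hypothetical and are not data from this study.

```python
def fold_enrichment(ct_target_enriched: float, ct_ref_enriched: float,
                    ct_target_input: float, ct_ref_input: float,
                    efficiency: float = 1.0) -> float:
    """Fold enrichment of a target locus over a reference locus (comparative Ct).

    efficiency=1.0 assumes perfect doubling each cycle (amplification factor 2).
    A negative delta-delta-Ct means the target is over-represented in the
    enriched fraction relative to the input.
    """
    amplification = 1.0 + efficiency
    delta_enriched = ct_target_enriched - ct_ref_enriched
    delta_input = ct_target_input - ct_ref_input
    delta_delta = delta_enriched - delta_input
    return amplification ** (-delta_delta)


if __name__ == "__main__":
    # Hypothetical Ct values (not from this study): a methylated target locus
    # versus an unmethylated reference locus, in the enriched fraction and input.
    fold = fold_enrichment(ct_target_enriched=25.0, ct_ref_enriched=27.5,
                           ct_target_input=26.0, ct_ref_input=24.0)
    print(round(fold, 1))  # prints 22.6
```

If the target and reference loci behave identically, ΔΔCt is zero and the function returns 1.0 (no enrichment). This matches the result reported for RPL30.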

TABLE 1

High Salt 0.5M (enriched) peak, no CpG
11-33.5: TGTGGGGTTGTTGTTTTGAGAGGGTTTTTTTTTGGGGTTTTTATTAATGATG (SEQ ID NO: 79)
6-33.5: AAACATTGGGAATATAGTATTTATTTTTGGTGATTATGTGTTTAGTTAAGTATTAGAGGATATTTTTA (SEQ ID NO: 28)
7-33.5: AATTTTTGTAGTTTTAGTAGAGATGGAGTTTTATTATGTTGGTTAGGTTGG (SEQ ID NO: 29)
8-33.5: GAAACAGGAGAATTTTTTGAATTTGGGTGGTAGAGG (SEQ ID NO: 30)
9-33.5: AGAAAATATGGTTTGTTAATGAATGATAGGTTAATTTTAGTATGTTGGTTATTTTAATATTTTGTTATTAGTTGGTTTGG (SEQ ID NO: 31)
H19-33.5: CAGGTATAGTGGTAAGAATTTGTAGTTTTAGTTATTTGGGAGGTTGAGTTAGGA (SEQ ID NO: 32)
H76-33.5: AAACTTTTGGTTGGGGGTGGTGGTTTATGTTTGTAATTTTAGTATTTTGGGAGGTCAAGGTGAGTGGAT (SEQ ID NO: 33)
H2-33.5: AGGTAGTTTTATTTTGGGTTTTAGGGAATAGGAGGGAATTAGAAGGA (SEQ ID NO: 34)
H5-33.5: CAGTATTTTGGGAGGTTAAGGTAGGTGGATTATGAGGTTAGGAGATTGAGA (SEQ ID NO: 35)
H21-33.5: GATGGATTGTTTGAGTTTAGGAGTTTGAGATTAG (SEQ ID NO: 36)
H24-33.5: TGAGTTTAGTTTAAGTTGATTGGGTAGGTAAATGTTTGTTATGAATTTGGAAGTGAGAGA (SEQ ID NO: 37)

High Salt 0.5M (enriched) peak, CpG
3-33.5 (725439 bp at 3′ side: nuclear receptor subfamily 4, group A, member 2 isoform a): CAGGTGTTGAGTGGTGAGGGATGTGTAAATAAGTAAGTGTGGGGTTCGGTTATTGCGTATAGTTAGGTATATTGGTTGTTGTGGGGTGGGGTAGGTAATTTAAGTATTAGTATGGGTATTGGTTTTTTGTGAGGC (SEQ ID NO: 38)
4-33.5 (Fanconi anemia, complementation group M): ACAAAAATTAGTTAGGTATAGTGGTATGTATTTGTAGTTTTAGTTAATCGGGATCCTGA (SEQ ID NO: 39)
5-33.5 (GENE ID: 10692 RRH | retinal pigment epithelium-derived rhodopsin homolog): GAATGGCAAGTATTGGATTATTTACGGTCGTGGTTGTGGATCGATA (SEQ ID NO: 40)
10-33.5 (transglutaminase 2 isoform b): AGTTTGTACGGTGAAGTTTAGGTTTTATTGTGGATACGGTTGAAATAGAAGAGTGATGGG (SEQ ID NO: 41)
H6-33.5 (31781 bp at 5′ side: von Willebrand factor preproprotein; 46059 bp at 3′ side: CD9 antigen): TGAACGCGGGAGGCGGAGTTTGTAGTGAGTTAAGATCGCGTTATTGTATTTTAG (SEQ ID NO: 42)
H7-33.5 (ref|NW_001838799.1|H52_WGA192_36): GGAAACGAATGAAATTATCGAATGGAATCGAATGGTGTTATCGAACGGA (SEQ ID NO: 43)
H12-33.5 (coagulation factor XIII A1 subunit precursor): CGGATAGGAGGGGTTGTTATGAAG (SEQ ID NO: 44)
H15-33.5 (545337 bp at 5′ side: EGF-like repeats and discoidin I-like domains-containing): TAGTTAATTATATGTGTTCGTTATTTGTGTATGTGG (SEQ ID NO: 45)
H45-33.5 (114563 bp at 5′ side: similar to hCG2036843): ATGAAAGTGTTTTGGGGATGGATGGGGGATATGGTTGTATAATGTGGCGGACG (SEQ ID NO: 46)
H55-33.5 (B-cell novel protein 1 isoform a): AGAATCGTTTGAGTTTAGGAGTTTAAGATTAGTTTGGGTAATATAGTGAGATTTTGTTGTTACGAAAATAAATAAAAAATTAGTTAGGTGTGGTGGTGTATGTTTGTGGT (SEQ ID NO: 47)
H64-33.5 (17408 bp at 5′ side: musashi 2 isoform b): TGTTTGTTGAGTGTACGTNTNNNGTATTTGTGTTGGGTGTATGTGGATGTGTGNGNTGAG (SEQ ID NO: 48)
H74-33.5 (Homo sapiens HECT domain containing 1 (HECTD1), mRNA): AGTTTGAAGTTTTTATAGAAGAAGGTTATGATTTATTTTCGGTAGGAAGTTTTGAAGAG (SEQ ID NO: 49)
H15a-33.5 (62438 bp at 5′ side: D-amino acid oxidase activator): AGGAAAGTTGGAAGGATGAGGATAACGTAGTGTTTTGTTGAAGAAGGAAGAGANNNNGGATTAAATTGAAATTGATTGGGTTTYTAAAATGGATGGGAT (SEQ ID NO: 50)
H27-33.5 (unc-51-like kinase 4): AGTTTGATTTTAGATTGTTGTGTTAGTAATGAGCGAGG (SEQ ID NO: 51)
H30-33.5 (spectrin repeat containing, nuclear envelope 2 isoform 1): TTATTTTTATAAAAATAAAAAAATTAGTTGGGTGTAGTGGCGTATGTTTGTNGTTTTAGT (SEQ ID NO: 52)
H31-33.5 (256834 bp at 5′ side: alpha 1 type IV collagen preproprotein): AACGATAAAGAAAATAAAAGGAGTGAGGGAGGATAGATGGG (SEQ ID NO: 53)
H35-33.5 (pumilio 1 isoform 1): ATTAGTTAGGCGTGGGGGTGGGTGTTTGTAGTTTTAGTTATTTAGGAGGTTGAGGTAGGA (SEQ ID NO: 54)
H7a-33.5 (zinc finger and BTB domain containing 46): AAGGTGGGGGTTGGGGGGNTNGTTTTTTCGGGNTGTTGTCGCGGNGGAGGAGCGTTTTAGAGTTTACGGCGTAGTTTTATTCGTCGGNATTTAGGTGGACGTTGATCGGGGGAGAGAATTGAGTATCGGGATC (SEQ ID NO: 55)
H9-33.5 (259088 bp at 3′ side: chromodomain protein, Y-like 2): AGAGTAGAGAGATGATTAAATTTATGTTAATTTTATTATTTTGGTTTTGAGGTTGTTGTRYAAGTTTTTTAGAATGTGAGTCGGGTATTGTTTTTGAGGTTAACGTTATTTGGTTTGCGTTT (SEQ ID NO: 56)

Low Salt 0.3M (control) peak, CpG
13-33.3: GGGAGGTAGTGATGAGAGTAATAGATAGGGTTTAGGTGTTTGTGTATGATATGTTTG (SEQ ID NO: 57)
L9-33.3: GATGTTATTAAATAATTAGATTATTTGTATTCGAATTGGGTAAGTAGTATAAAGGANAANGATATTATTAAATAATTAGACTATTTGTATTCGAATTGGGTAAGTAGTACAAAGGAGAAGTGGGGNAA (SEQ ID NO: 58)
3-2-33.3 (19744 bp at 3′ side: Myc-binding protein-associated protein): TTTGTAGAAGGATGTGAGAGGAGAAGTGAGCGGTTTTATAGGTATGATGTTAGTTATAAGGGGTTGGTGAGTTGATGTGGGAGGATTATTTGGTTTAGGAGTTTAAGGTTGCGGTGAGT (SEQ ID NO: 59)
L-17.33 (dihydrouridine synthase 3-like): TGAGGGTTGGGTTTAGGATAGAGTATAGAGAGGGAGATTTAGTTAGGAGTTTTTTTAAGGTATATAGTTTTTGATTTTTAGGTAGTTAGAATAGGAACGTGGATATAGTTGGTATTTAATAGACGTATATTAGATGGATAGATTTGTTATTGA (SEQ ID NO: 60)

Low Salt 0.3M (control) peak, no CpG
3-5-33.3: TAGTAGTATGATGTTAGTTTTTTTTAAATTATAGATTCAATAAAATTCAGTTAAAATTTTATTAGTTTTATTTATTTATTGATTTAGTAGAGATGGATATAGTACTGT (SEQ ID NO: 61)
3-6-33.3: GTGTTATCGTATTGGGGTTATTTGTGTAATTAATATGTGTTATTTAGTTTTAGGGTGTATGTTTATTGTTTTAATTATGATGGAGGTGTAGTTTGGAGATTTTGTGTTAGGAGATTAGTAGAGTTTGGGGTTTTAAGGGGATTTTTTGTGGGGGAGAGGGATAGTTGTGTAGTAGAGTGATAATGAAGGTTTTTGATTTAATGTGTAGTTTTTAGGTTATGTGT (SEQ ID NO: 62)
3-8-33.3: TTTGGGAGGTTGAGGTGGGTAGATTATGATGTTAAGAGATTGAGATTAT (SEQ ID NO: 63)
L1-33.3: GATGAAAGGTTAAAAATTGAGATAGAAGATGTGATTTGGAAGGTTATAAGAGAAGTTGGATAAAGTTAAATAAGGAAAGGAATTTAGAAAAAAGTGTTTAATGTTGTAGAAGG (SEQ ID NO: 64)
L1-19.3: CTATTCTTCCCATTCTCAACATAACTCTAACCTTCCTTCATCCTCACACCCAACAATCATTCACTCATTTATCTA (SEQ ID NO: 65)
L-1.33: GATAAAGTTGTGNGTAGGGATTTTTGGTAGAGGGAATAGAAAGATGGAGGTGTTGAGGTAGGAGTGATGGGTAGGTTTGAAGAGTAGAGTTTAGTGTAGTGAGGGGGTTATTAGTAAGGG (SEQ ID NO: 66)
L-11.33: ATATTTTATGGAGGAGTAATTTTTAGAGTATATGAATTGGTTTTATGGAGGAAGATTGTTATTTATAGGTTGGTGTAAGTGATGGTAGTAGTGGTTTGTC (SEQ ID NO: 67)
L-12.33: AGAAGATAAGGAGAAGATAATTATTNTTTTGGTAGAGGTAATTGATTTGATTATTAGGA (SEQ ID NO: 68)
L-15.33: ATGTGTATTTAAAGTAAGGTTATGAGATTTTGGATTGTTTTTTGTTTAGGATGATATGTG (SEQ ID NO: 69)
L-16.33: AAGTAAAATAATTTTGTTTTTATTTATTTTANAGGATTGTT (SEQ ID NO: 70)
L-18.33: AAAATTTTAAGATTAGGTAAAAATATTGTGTAAAGTGAGAGGGATGTGATGGTTAAAAAGTGATTTAAGATTTTTGTAATTTTTAGTTATAATTTAAGA (SEQ ID NO: 71)
L-2.33: GAGATAATAGTGAGTATGATATTTTTTGTTTTTTTTATTATGTGTTAAGTATTGTTTAGGGATTAAGTGGGGTTGTGTTTATTGTAGATGTTGTAGGTATGGAGTTAGTA (SEQ ID NO: 72)
L-20.33: ATGTATTTAGTTGTTTATTGAATATTATTTTAATATTGTATTATGAATATTGTTATGTTATGGATTTTAGGTTTTATTAGATTGGTATTAGTATCATTTAGGAATATTTTATGATGTGTGTTGATAAATTTTTAAGATAAATGAATTTGAGATATGTGTGAGTATTTTATAAAATAAATTTTGTTGGA (SEQ ID NO: 73)
L-23.33: ATGGTTTGTTTGTTTTTGTGGAAAATGGTATGAAGATTGGGTTTGTATTGAATTTG (SEQ ID NO: 74)
L-24.33: TGTAGTTTTAGTTATTTAGGAGGTTGAGATATGAGAATTATTTGAATTTGGGGGGGGAAGGTTGTAGTGA (SEQ ID NO: 75)
L-27.33: TGAGAAGGGGGTAGTGGGGATGGTTTTGTGGGTTTATGTTGTTTTTGATTTTAGAAAATAAAGTTTTTTGTAGGAAGTAGGTGGGAAGTAATTTGTTGATAAGTGTAAAGATTTGGGAATTATATTAAGGGGTAAATGGAGGANAGGTGTTGGTGTTAANGAGGTAGACNTATGGGAGTTNGGTTTTAGGAANGGNNGTGGNTAGAAAGG (SEQ ID NO: 76)
L-28.33: GGTAGGTAGATTATTTGAGGTTAGGAGTTTAAG (SEQ ID NO: 77)
L-4.33: ATATTTTTTTATTGAAGAATGTAGTTTTTTAAAATTAAAATGTATTTTTAAAATTTATTTATTATTTTTT--GAGATAAGGTTTTGTTTTGTTGTTTAAGTTAGAGTATAGTATGTGATTATAGTTTATTGTAGTTTTGAATTTTTGGGTTTAAG (SEQ ID NO: 78)

Table 1 above shows the results of sequence analysis of the two NaCl peaks from the SRA-domain column, which showed a better than 10-fold enrichment of methylated CpG DNA. Of 30 reads with an average length of 63 bases in the high (0.5 M) NaCl fraction, 19 contained at least one methylated CpG. Of the 1900 bases sequenced, 44 (2.32% of the total) were methylated CpG. Of 22 reads with an average length of 105 bases in the low-salt 0.3 M peak, 3 contained methylated CpG. Of these 2327 bisulfite-converted bases, 5 (0.215% of the total) were identified as methylated CpG.
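The better than 10-fold figure follows from the per-base frequencies. The short calculation below reproduces it from the counts stated above. The function name is ours. The comparison uses methylated CpG per sequenced base, which the text implies. It does not use the fraction of reads containing methylated CpG, which would give a smaller ratio (19/30 versus 3/22).

```python
def percent_methylated_cpg(methylated_cpg: int, bases_sequenced: int) -> float:
    """Methylated CpG calls as a percentage of all bisulfite-converted bases sequenced."""
    if bases_sequenced <= 0:
        raise ValueError("bases_sequenced must be positive")
    return 100.0 * methylated_cpg / bases_sequenced


# Counts reported above for the two NaCl step-elution peaks.
high_salt = percent_methylated_cpg(44, 1900)  # 0.5 M NaCl (enriched) peak
low_salt = percent_methylated_cpg(5, 2327)    # 0.3 M NaCl (control) peak
enrichment = high_salt / low_salt

print(f"{high_salt:.2f}% vs {low_salt:.3f}% -> {enrichment:.1f}-fold")
# prints: 2.32% vs 0.215% -> 10.8-fold
```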

1.-16. (canceled)
17. A composition comprising: a first polypeptide comprising a sequence having at least 90% amino acid sequence homology with SEQ ID NO:3; and a mixture containing methylated and unmethylated polynucleotides, wherein the polynucleotides are single-stranded.
18. The composition of claim 17, further comprising a second polypeptide fused to the first polypeptide.
19. The composition of claim 17, wherein the first polypeptide is immobilized on a solid substrate.
20. The composition of claim 18, wherein the second polypeptide is a substrate-binding domain.
21. The composition of claim 20, wherein the second polypeptide is maltose-binding protein.
22. The composition of claim 17, wherein the first polypeptide is selected from the group consisting of: human UHRF1 and mouse NP95 SRA.
23. The composition of claim 17, further comprising a low concentration of salt.
24. The composition of claim 23, wherein the low concentration of salt is less than 0.4 M.
25. The composition of claim 17, further comprising salt at a concentration of 0.4 M-0.6 M.
26. The composition of claim 25, wherein the salt is NaCl.
27. The composition of claim 17, wherein the methylated polynucleotides contain hemi-methylated CpG.
28. A method, comprising: (a) comparing the methylation pattern for selected polynucleotide sequences in both pre-identified immortalized eukaryotic cells and non-immortalized eukaryotic cells by differential binding of methylated polynucleotides to the first polypeptide of claim 17; (b) determining the presence of abnormal methylation patterns associated with alteration of tumor suppressor function; and (c) utilizing the abnormal methylation patterns as a diagnostic tool for determining whether any eukaryotic cells in a sample are immortalized.
29. The method according to claim 28, wherein the methylated polynucleotide contains hemi-methylated CpG.
30. The method according to claim 28, wherein step (a) further comprises forming single-stranded DNA for differential binding of the hemi-methylated CpG-containing polynucleotide.